Introduction

Understanding crime dynamics is essential for effective public safety strategies. In recent years, researchers have increasingly explored how external factors, such as weather conditions, may influence criminal activity. This report examines how climatic variables relate to crime rates in Colchester, Essex, during 2024.

Overview of the Datasets

This investigation uses two datasets: crime24.csv, which contains street-level crime records for Colchester dated to the month (with fields such as crime type, location, and outcome), and temp24.csv, which provides daily weather data such as temperature, humidity, pressure, and precipitation from a nearby station.

Aim and Objectives

The analysis begins by independently exploring both crime and climate datasets to identify seasonal, temporal, and geographic patterns. These datasets are then merged to examine how fluctuations in temperature, humidity, or pressure align with changes in crime volume or type. A variety of visualisation techniques, including time series, scatter plots, and spatial mapping, are applied to highlight key patterns and associations. The ultimate goal is to derive practical, actionable recommendations for local law enforcement based on data-informed insights.

Research Question

Are crime levels influenced by weather in Colchester?

To answer this, we’ll perform:

- Exploratory data analysis and visualisation of both datasets
- Time series analysis and smoothing
- Spatial mapping with Leaflet
- Correlation and clustering techniques to explore links between climate and crime

Through this investigation, we aim to uncover trends that may assist in strategic policing, crime prevention, and public awareness campaigns tailored to seasonal or climatic patterns.


Data Preparation

### Load required packages
library(dplyr)
library(ggplot2)
library(lubridate)
library(readr)
library(forcats)
library(leaflet)
library(plotly)
library(tidyr)
library(reshape2)
library(patchwork)
library(viridis)
library(gganimate)

Load the datasets

# Load the crime dataset from CSV file
crime24 <- read_csv("crime24.csv", show_col_types = FALSE)
## New names:
## • `` -> `...1`
# Display the first few rows of the dataset
as.data.frame(head(crime24))
##   ...1              category persistent_id    date      lat     long street_id
## 1    1 anti-social-behaviour          <NA> 2024-01 51.89301 0.901028   2153130
## 2    2 anti-social-behaviour          <NA> 2024-01 51.88979 0.898830   2153105
## 3    3 anti-social-behaviour          <NA> 2024-01 51.89825 0.902107   2153147
## 4    4 anti-social-behaviour          <NA> 2024-01 51.87837 0.888373   2152856
## 5    5 anti-social-behaviour          <NA> 2024-01 51.87905 0.889521   2152871
## 6    6 anti-social-behaviour          <NA> 2024-01 51.88860 0.899203   2153107
##                               street_name context        id location_type
## 1                  On or near Middle Mill      NA 115967607         Force
## 2 On or near Conference/exhibition Centre      NA 115967129         Force
## 3                   On or near Mason Road      NA 115967591         Force
## 4              On or near Kensington Road      NA 115967062         Force
## 5                 On or near Lambeth Road      NA 115967058         Force
## 6               On or near Trinity Street      NA 115967547         Force
##   location_subtype outcome_status
## 1             <NA>           <NA>
## 2             <NA>           <NA>
## 3             <NA>           <NA>
## 4             <NA>           <NA>
## 5             <NA>           <NA>
## 6             <NA>           <NA>
# Load the temperature dataset from CSV file
temp24 <- read_csv("temp24.csv", show_col_types = FALSE)

# Display the first few rows of the dataset
as.data.frame(head(temp24))
##   station_ID       Date TemperatureCAvg TemperatureCMax TemperatureCMin TdAvgC
## 1       3590 2024-12-31             6.5             7.7             5.0    4.4
## 2       3590 2024-12-30             5.6             6.9             3.4    4.9
## 3       3590 2024-12-29             3.3             4.9             2.2    3.2
## 4       3590 2024-12-28             4.0             5.8             2.3    3.7
## 5       3590 2024-12-27             5.3             6.7             4.3    5.1
## 6       3590 2024-12-26             6.7            10.0             5.6    6.4
##   HrAvg WindkmhDir WindkmhInt WindkmhGust PresslevHp Precmm TotClOct lowClOct
## 1  86.4        WSW       22.7        42.6     1025.3    0.0      4.5      7.2
## 2  94.9        WSW       16.7        40.8     1028.5    0.0      8.0      8.0
## 3  98.6          W       11.4        22.2     1028.5    0.4      8.0      8.0
## 4  98.4         SW        5.5        14.8     1031.8    0.4      8.0      8.0
## 5  98.4          S        6.3        16.7     1034.7    0.4      8.0      8.0
## 6  98.3        WSW        9.3        22.2     1033.6    0.4      8.0      8.0
##   SunD1h VisKm SnowDepcm PreselevHp
## 1    5.7  63.4        NA         NA
## 2    0.0  15.3        NA         NA
## 3    0.0   0.5        NA         NA
## 4    0.0   0.1        NA         NA
## 5    0.0   0.5        NA         NA
## 6    0.0   0.2        NA         NA

Convert date format and extract useful time features

# Crime dates are given as "YYYY-MM"; parse them as the first day of each month
# (lubridate::ym() turns "2024-01" into the Date 2024-01-01)
crime24$date <- ym(crime24$date)

# Extract full month name from the date
crime24$month <- month(crime24$date, label = TRUE, abbr = FALSE)

# Assign a season based on the month number
crime24$season <- case_when(
  month(crime24$date) %in% c(12, 1, 2) ~ "Winter",   # December–February
  month(crime24$date) %in% c(3, 4, 5) ~ "Spring",    # March–May
  month(crime24$date) %in% c(6, 7, 8) ~ "Summer",    # June–August
  month(crime24$date) %in% c(9, 10, 11) ~ "Autumn"   # September–November
)

# Remove unused columns with no analytical value
crime24 <- crime24 %>% select(-c(context, location_subtype))

# Replace missing values in outcome_status with "Unknown"
crime24$outcome_status <- ifelse(is.na(crime24$outcome_status), "Unknown", crime24$outcome_status)

# Display a count of missing values by column
colSums(is.na(crime24))
##           ...1       category  persistent_id           date            lat 
##              0              0            732              0              0 
##           long      street_id    street_name             id  location_type 
##              0              0              0              0              0 
## outcome_status          month         season 
##              0              0              0

Clean and prepare temperature dataset

# Convert the 'Date' column to proper Date format
temp24$Date <- as.Date(temp24$Date)

# Extract the full month name from the date
temp24$month <- month(temp24$Date, label = TRUE, abbr = FALSE)

# Assign a season based on the month number
temp24$season <- case_when(
  month(temp24$Date) %in% c(12, 1, 2) ~ "Winter",   # December–February
  month(temp24$Date) %in% c(3, 4, 5) ~ "Spring",    # March–May
  month(temp24$Date) %in% c(6, 7, 8) ~ "Summer",    # June–August
  month(temp24$Date) %in% c(9, 10, 11) ~ "Autumn"   # September–November
)

# Remove unnecessary columns with excessive missing values
temp24 <- temp24 %>% select(-c(PreselevHp, SnowDepcm))

# Replace missing values in rainfall and low cloud cover columns with 0
temp24$Precmm[is.na(temp24$Precmm)] <- 0
temp24$lowClOct[is.na(temp24$lowClOct)] <- 0

# Display a count of remaining missing values by column
colSums(is.na(temp24))
##      station_ID            Date TemperatureCAvg TemperatureCMax TemperatureCMin 
##               0               0               0               0               0 
##          TdAvgC           HrAvg      WindkmhDir      WindkmhInt     WindkmhGust 
##               0               0               0               0               0 
##      PresslevHp          Precmm        TotClOct        lowClOct          SunD1h 
##               0               0               0               0               0 
##           VisKm           month          season 
##               0               0               0

Exploratory Data Analysis (EDA)

Crime Category Distribution

# Create a bar plot showing the distribution of crime categories, ordered by frequency
crime_category_plot <- ggplot(crime24, aes(x = fct_infreq(category), fill = category)) +
  geom_bar(color = "black") +  # Add black borders to bars
  labs(title = "Distribution of Crime Categories",
       x = "Crime Category", y = "Frequency") +
  theme_minimal() +  # Apply a clean, minimal theme
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

# Convert the static ggplot to an interactive plotly version
ggplotly(crime_category_plot)

The interactive bar plot shows that violent crime is the most frequently reported offence in Colchester, followed by anti-social behaviour. Other crimes like shoplifting and criminal damage occur moderately, while robbery and possession of weapons are less common. Overall, violence and anti-social behaviour dominate the crime profile in 2024.

Outcome Status of Crimes

# Create a bar plot showing the distribution of crime outcome statuses, ordered by frequency
outcome_plot <- ggplot(crime24, aes(x = fct_infreq(outcome_status), fill = outcome_status)) +
  geom_bar(color = "black") +  # Add black outline around bars
  labs(title = "Outcome Status of Reported Crimes",
       x = "Outcome Status", y = "Count") +
  theme_minimal() +  # Use a clean minimal theme
  theme(
    axis.text.x = element_text(angle = 70, hjust = 1),  # Tilt x-axis labels for better readability
    axis.title = element_text(size = 12),               # Set font size for axis titles
    plot.title = element_text(size = 16, face = "bold") # Make plot title larger and bold
  ) +
  scale_fill_manual(values = rainbow(length(unique(crime24$outcome_status)))) +  # Use a distinct color for each outcome
  guides(fill = guide_legend(title = "Outcome Status"))  # Add a legend title

# Convert the static plot to an interactive plotly version
ggplotly(outcome_plot)

The bar plot reveals that most reported crimes result in no suspect being identified, followed by cases where the suspect cannot be prosecuted. A notable portion of cases remain under investigation or are awaiting court outcomes, while fewer end in formal charges or other resolutions. The "Unknown" bar consists of records whose outcome was missing and was filled in during cleaning, so it reflects gaps in the data rather than an actual outcome. Overall, many crimes in Colchester do not lead to prosecution, highlighting challenges in investigation and legal follow-through.

Crime by Location Type

# Create a bar chart showing the number of crimes by location type
location_plot <- ggplot(crime24, aes(x = location_type, fill = location_type)) +
  geom_bar(color = "black") +  # Add black borders to each bar
  labs(title = "Crimes by Location Type",
       x = "Location Type", y = "Number of Crimes") +
  theme_minimal()  # Apply a clean, minimal visual theme

# Convert the ggplot chart into an interactive Plotly version
ggplotly(location_plot)

The bar chart shows that nearly all crimes are recorded by the local police force (“Force”), with very few linked to the British Transport Police (“BTP”). This suggests that most criminal incidents occur in community settings rather than on the rail network, emphasising the primary role of local policing in managing crime.

Top 10 Crime Hotspot Streets

# Identify the top 10 streets with the highest number of crimes
top_streets <- crime24 %>%
  group_by(street_name) %>%
  summarise(crime_count = n()) %>%
  arrange(desc(crime_count)) %>%
  slice_head(n = 10)

# Filter the original crime dataset to include only those top 10 streets
crime_top <- crime24 %>% filter(street_name %in% top_streets$street_name)

# Create a stacked bar plot showing crime type distribution across the top 10 streets
hotspot_plot <- ggplot(crime_top, aes(x = street_name, fill = category)) +
  geom_bar() +  # Plot counts by street and crime category
  labs(title = "Crime Type Distribution in Top 10 Streets",
       x = "Street", y = "Number of Crimes") +
  scale_fill_viridis_d(option = "plasma") +  # Apply a visually appealing color palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Tilt x-axis labels for readability

# Make the chart interactive with Plotly
ggplotly(hotspot_plot)

The bar plot shows that “On or near Supermarket” and “On or near Shopping Area” are the top crime hotspots, largely driven by shoplifting. In contrast, locations like nightclubs and police stations show a more diverse mix of crimes, including violent offences and public disorder. This suggests that crime patterns vary by location, with retail areas experiencing targeted offences and other zones showing broader criminal activity.

Spatial Mapping of Crime (Leaflet Map)

# Create an interactive leaflet map to display crime locations across Colchester
leaflet(crime24) %>%
  addTiles() %>%  # Add default OpenStreetMap tile layer as the base map
  addCircleMarkers(
    lng = ~long, lat = ~lat,       # Use longitude and latitude for marker placement
    radius = 3,                    # Set marker size
    color = "blue",                # Use blue for marker color
    stroke = FALSE,                # Remove border around circles
    fillOpacity = 0.6              # Set transparency for better visibility
  ) %>%
  setView(                         # Center the map view on the average location of all crimes
    lng = mean(crime24$long, na.rm = TRUE),
    lat = mean(crime24$lat, na.rm = TRUE),
    zoom = 13                      # Set zoom level for city-scale view
  )

The interactive leaflet map reveals that central Colchester, particularly around major roads and commercial zones, is the main crime hotspot. Areas like Cymbeline Way and the town centre show the highest concentration of incidents, while outer residential areas see far fewer. This indicates that crime is closely tied to high-traffic, public-facing spaces.

Monthly Crime Trend Over Time

# Ensure the 'date' column is in proper Date format
crime24$date <- as.Date(crime24$date)

# Create a new variable representing month and year (e.g., "2024-03")
crime24$month_year <- format(crime24$date, "%Y-%m")

# Group the dataset by month and count the number of crimes in each
monthly_crimes <- crime24 %>%
  group_by(month_year) %>%
  summarise(total_crimes = n()) %>%
  mutate(month_year = as.Date(paste0(month_year, "-01")))  # Convert to Date for plotting

# Generate a line plot showing monthly crime trends, with a loess smoothing line
monthly_trend_plot <- ggplot(monthly_crimes, aes(x = month_year, y = total_crimes)) +
  geom_line(color = "steelblue", linewidth = 1) +  # Line for monthly crime totals (linewidth replaces size in ggplot2 >= 3.4.0)
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "darkred", linetype = "dashed") +  # Smoothed trend line
  labs(title = "Monthly Crime Trend in Colchester (2024)",
       x = "Month", y = "Number of Crimes") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels
# Convert to an interactive Plotly plot
ggplotly(monthly_trend_plot)

The line chart shows that crime in Colchester peaked mid-year, with the highest level in July, and declined towards the end of 2024. January and February also saw elevated activity, while April marked a low point. This is suggestive of a seasonal pattern, with more crime in warmer months, possibly due to higher public activity, although twelve monthly totals are too few to separate a seasonal effect from month-to-month noise.
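The loess curve is fitted to only twelve monthly totals, so a simpler, more transparent smoother can serve as a cross-check. The sketch below computes a centred three-month moving average with `stats::filter()`, called with its namespace because both dplyr and plotly mask `filter` in this session. For illustration it uses the six monthly totals printed in the merged table (January to June 2024); in the report the input would be `monthly_crimes$total_crimes`.

```r
# Monthly crime totals for January to June 2024, copied from the merged table
total_crimes <- c(529, 546, 502, 471, 568, 490)

# Centred 3-month moving average: each value is the mean of the previous month,
# the month itself and the following month (sides = 2); the two ends lack a full window
moving_avg <- stats::filter(total_crimes, filter = rep(1 / 3, 3), sides = 2)

# Drop the "ts" class and round for display
smoothed <- round(as.numeric(moving_avg), 1)
print(data.frame(month = month.abb[1:6], total_crimes, smoothed))
```

A three-month window loses one month at each end, which is a real cost with so few points; in exchange, unlike loess, every smoothed value can be checked by hand.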

Crime Count Across Seasons

# Group crime data by season and count total crimes in each
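seasonal_crimes <- crime24 %>%
  group_by(season) %>%
  summarise(total_crimes = n()) %>%
  # Reorder the factor levels by crime count (descending) so ggplot draws the bars in that order;
  # arrange() alone does not change the bar order on a discrete axis
  mutate(season = fct_reorder(season, total_crimes, .desc = TRUE))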

# Create a bar plot showing the number of crimes in each season
season_plot <- ggplot(seasonal_crimes, aes(x = season, y = total_crimes, fill = season)) +
  geom_bar(stat = "identity", color = "black") +  # Use black borders on bars
  scale_fill_manual(values = c("Winter" = "#4682B4", "Spring" = "#90EE90",
                               "Summer" = "#FFD700", "Autumn" = "#FF8C00")) +  # Assign custom colors
  labs(title = "Crime Count Across Seasons (2024)",
       x = "Season", y = "Number of Crimes") +
  theme_minimal()  # Apply a clean visual style

# Convert to an interactive plot using Plotly
ggplotly(season_plot)

The bar chart shows that Summer had the highest crime count in Colchester during 2024, followed by Autumn, while Spring recorded the fewest crimes. This is consistent with slightly more crime in the warmest months, possibly reflecting greater public activity, but the low Spring total shows that the pattern does not simply track temperature. Note also that "Winter" combines January and February 2024 with December 2024 rather than covering one continuous winter. Overall, the seasonal variation in crime is modest.

Heatmap of Crime Categories Across Seasons

# Count the number of crimes grouped by both season and category
crime_season_heatmap <- crime24 %>%
  group_by(season, category) %>%
  summarise(crime_count = n(), .groups = 'drop')  # Drop grouping after summarising

# Create a heatmap to show the distribution of crime types across seasons
heatmap_plot <- ggplot(crime_season_heatmap, aes(x = season, y = category, fill = crime_count)) +
  geom_tile(color = "white") +  # Draw rectangles with white borders
  scale_fill_gradient(low = "#ffffcc", high = "#006837", name = "Crime Count") +  # Color gradient
  labs(title = "Crime Types by Season in Colchester (2024)",
       x = "Season", y = "Crime Category") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),         # Rotate x-axis labels
    plot.title = element_text(size = 16, face = "bold"),       # Style the title
    legend.position = "right"                                  # Position the legend
  )

# Convert the heatmap to an interactive version
ggplotly(heatmap_plot)

The heatmap shows that violent crime is consistently the most reported offence across all seasons in Colchester, with slight peaks in summer and winter. Other common crimes like anti-social behaviour and shoplifting occur steadily year-round, while less frequent crimes show little seasonal change. Overall, crime patterns remain fairly stable throughout the year, with only modest seasonal variation.

Exploring Weather’s Impact on Crime Rates

Merging crime24.csv & temp24.csv

# Format dates for merging: extract "YYYY-MM" from each date
crime24$merge_date <- format(crime24$date, "%Y-%m")
temp24$merge_date <- format(temp24$Date, "%Y-%m")

# Aggregate weather data by month: calculate mean values for each variable
temp_monthly <- temp24 %>%
  group_by(merge_date) %>%
  summarise(
    avg_temp = mean(TemperatureCAvg, na.rm = TRUE),
    max_temp = mean(TemperatureCMax, na.rm = TRUE),
    min_temp = mean(TemperatureCMin, na.rm = TRUE),
    humidity = mean(HrAvg, na.rm = TRUE),
    pressure = mean(PresslevHp, na.rm = TRUE),
    precipitation = mean(Precmm, na.rm = TRUE)  # mean daily rainfall (mm/day), not the monthly total
  )

# Aggregate total number of crimes by month
crime_monthly <- crime24 %>%
  group_by(merge_date) %>%
  summarise(
    total_crimes = n()
  )

# Merge crime and weather datasets by month
weather_crime <- left_join(crime_monthly, temp_monthly, by = "merge_date")

# View the structure of the merged dataset
as.data.frame(head(weather_crime))
##   merge_date total_crimes  avg_temp  max_temp  min_temp humidity pressure
## 1    2024-01          529  4.251613  7.348387 0.7419355 83.11935 1015.652
## 2    2024-02          546  7.682759 11.041379 4.0482759 87.22759 1009.383
## 3    2024-03          502  8.135484 11.441935 4.5258065 83.28710 1005.510
## 4    2024-04          471  9.083333 13.393333 4.5500000 78.56333 1012.100
## 5    2024-05          568 13.396774 18.277419 8.4645161 82.49032 1012.271
## 6    2024-06          490 14.323333 19.670000 7.7733333 73.88667 1014.133
##   precipitation
## 1      1.735484
## 2      3.193103
## 3      1.883871
## 4      1.813333
## 5      2.600000
## 6      0.840000

Correlation Between Crime and Weather Variables

# Remove the merge_date column and compute a correlation matrix for numeric variables
cor_matrix <- cor(weather_crime[, -1], use = "complete.obs")

# Reshape the correlation matrix into long format for plotting
cor_melted <- melt(cor_matrix)

# Round correlation values to two decimal places for labels
cor_melted$label <- round(cor_melted$value, 2)

# Create an interactive correlation heatmap using Plotly
crime_weather_corr_plot <- plot_ly(
  data = cor_melted,
  x = ~Var1,
  y = ~Var2,
  z = ~value,
  type = "heatmap",
  text = ~label,                      # Display correlation values as text
  texttemplate = "%{text}",           # Format text appearance
  textfont = list(color = "black", size = 12),
  hoverinfo = "text",                 # Show only text on hover
  colorscale = "YlGnBu",              # Apply Yellow-Green-Blue sequential color scale
  zmin = -1, zmax = 1,                # Set color scale limits
  colorbar = list(title = "Correlation")
) %>%
  layout(
    title = "Correlation Between Crime & Weather Variables",
    xaxis = list(title = "", tickangle = -45),
    yaxis = list(title = "")
  )


# Render the interactive correlation heatmap
crime_weather_corr_plot

The correlation matrix suggests that monthly crime totals in Colchester during 2024 are moderately positively correlated with temperature and precipitation, consistent with somewhat higher crime in warmer and wetter months. Humidity and pressure show weak negative correlations. Each coefficient, however, rests on only twelve monthly observations, so these values are imprecise, and correlation alone cannot show that weather influences crime. Within those limits, temperature and rainfall show the strongest links to crime trends.
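With one observation per month, each coefficient in the matrix is based on n = 12 points. One way to gauge how large a correlation must be before it is distinguishable from zero is to invert the t statistic used by Pearson's test, t = r√(n − 2) / √(1 − r²). The sketch below does this and then runs `cor.test()` on the six months printed in the merged table (January to June only, so n = 6), purely to show the call; in the report it would take `weather_crime$total_crimes` and `weather_crime$avg_temp`.

```r
# Smallest |r| that reaches two-sided significance at level alpha with n points.
# Solves t = r * sqrt(n - 2) / sqrt(1 - r^2) for r at the critical t value.
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

r_crit_12 <- critical_r(12)  # one observation per month of 2024
cat("Critical |r| for n = 12:", round(r_crit_12, 3), "\n")

# Pearson test on the first six months of the merged table (illustration only)
total_crimes <- c(529, 546, 502, 471, 568, 490)
avg_temp <- c(4.251613, 7.682759, 8.135484, 9.083333, 13.396774, 14.323333)
res <- cor.test(total_crimes, avg_temp, method = "pearson")
print(res)
```

For n = 12 the threshold is roughly |r| ≈ 0.58 at the 5% level, so "moderate" coefficients may still be compatible with no association. `method = "spearman"` gives a rank-based alternative that is less sensitive to a single unusual month.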

Scatter Plots: Weather vs Crime

# Plot 1: Relationship between average temperature and total crimes
p1 <- ggplot(weather_crime, aes(x = avg_temp, y = total_crimes)) +
  geom_point(color = "#E67E22", size = 3) +  # Orange points
  geom_smooth(method = "lm", color = "red", se = FALSE) +  # Linear regression line
  labs(
    title = "Avg Temperature vs Crime",
    x = "Average Temperature (°C)",
    y = "Total Crimes"
  ) +
  theme_minimal()

# Plot 2: Relationship between average humidity and total crimes
p2 <- ggplot(weather_crime, aes(x = humidity, y = total_crimes)) +
  geom_point(color = "#3498DB", size = 3) +  # Blue points
  geom_smooth(method = "lm", color = "red", se = FALSE) +  # Linear regression line
  labs(
    title = "Humidity vs Crime",
    x = "Average Humidity (%)",
    y = "Total Crimes"
  ) +
  theme_minimal()

# Plot 3: Relationship between average pressure and total crimes
p3 <- ggplot(weather_crime, aes(x = pressure, y = total_crimes)) +
  geom_point(color = "#2ECC71", size = 3) +  # Green points
  geom_smooth(method = "lm", color = "red", se = FALSE) +  # Linear regression line
  labs(
    title = "Pressure vs Crime",
    x = "Average Pressure (hPa)",
    y = "Total Crimes"
  ) +
  theme_minimal()

# Convert each ggplot object into interactive Plotly plots
p1_plot <- ggplotly(p1)
p2_plot <- ggplotly(p2)
p3_plot <- ggplotly(p3)
# Combine the three interactive plots into a single row layout
subplot(p1_plot, p2_plot, p3_plot, nrows = 1, margin = 0.05, titleX = TRUE, titleY = TRUE) %>%
  layout(
    title = list(
      text = "Weather Factors and Their Relationship with Crime",
      x = 0.5,              # Center the main title
      xanchor = "center",
      y = 0.95
    )
  )

The scatter plots show that crime in Colchester tends to rise with higher average temperature, suggesting a positive link between warmth and criminal activity. In contrast, humidity and pressure show slight negative relationships with crime, though these are weaker. Of the three variables plotted, temperature appears to be the weather factor most strongly associated with monthly crime totals in 2024, although each panel contains only twelve points.

Crime in Wet vs Dry Months

# Classify each month as "Wet" or "Dry" by its mean daily precipitation (threshold: 1 mm/day)
weather_crime$rain_type <- ifelse(weather_crime$precipitation >= 1, "Wet", "Dry")

# Create a boxplot comparing total crime counts between wet and dry months
wet_dry_crime_boxplot <- ggplot(weather_crime, aes(x = rain_type, y = total_crimes, fill = rain_type)) +
  geom_boxplot() +  # Display distribution, median, and range of crimes per group
  scale_fill_manual(values = c("Wet" = "#1F78B4", "Dry" = "#FFD700")) +  # Blue for wet, gold for dry
  labs(
    title = "Crime Comparison: Wet vs Dry Months",
    x = "Month Type", y = "Total Crimes"
  ) +
  theme_minimal()  # Apply a clean minimal theme

# Convert the static boxplot into an interactive Plotly version
ggplotly(wet_dry_crime_boxplot)

The boxplot compares crime levels between wet and dry months in Colchester for 2024. Wet months show a slightly higher median crime count and a wider spread. Each box summarises only a handful of monthly totals, however, and in the months shown in the merged table the 1 mm/day threshold classes five of six months as wet, so this is at best a weak hint of an association between rainfall and crime; explanations such as changes in social activity or weather-related disruption remain speculative.
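A boxplot is only informative when each group has several members. The sketch below applies the same 1 mm/day rule to the six monthly precipitation means printed in the merged table (January to June 2024) and counts how many months land in each group; for the full year, `table(weather_crime$rain_type)` performs the same check.

```r
# Mean daily precipitation (mm) and crime totals for January to June 2024, from the merged table
precipitation <- c(1.735484, 3.193103, 1.883871, 1.813333, 2.600000, 0.840000)
total_crimes <- c(529, 546, 502, 471, 568, 490)

# Same rule as the report: a month is "Wet" if its mean daily rainfall is at least 1 mm
rain_type <- ifelse(precipitation >= 1, "Wet", "Dry")

# Group sizes: how many months support each box
print(table(rain_type))

# Median crime count per group, the statistic drawn as each box's centre line
group_medians <- tapply(total_crimes, rain_type, median)
print(group_medians)
```

In these six months the rule leaves a single dry month, so a "Dry" box would be drawn from one value. If the full year is similarly unbalanced, a lower threshold or a scatter plot of precipitation against crime may be more informative.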

Average Temperature Range vs Crime Types

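# Join crime and daily weather records by merge_date (month). Crime records are dated only to
# the month, so each crime is repeated once for every day of its month (hence many-to-many);
# bar heights are therefore not crime counts, which is why proportions are plotted below
combined <- left_join(crime24, temp24, by = "merge_date", relationship = "many-to-many")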

# Categorize temperature into three ranges using quantiles: Low, Medium, High
combined$temp_range <- cut(
  combined$TemperatureCAvg,
  breaks = quantile(combined$TemperatureCAvg, probs = c(0, 0.33, 0.66, 1), na.rm = TRUE),
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

# Create a proportional bar plot showing crime category distribution by temperature range
crime_by_temp_range_plot <- ggplot(combined, aes(x = temp_range, fill = category)) +
  geom_bar(position = "fill") +  # Fill bars proportionally by category
  labs(
    title = "Crime Type Distribution by Temperature Range",
    x = "Temperature Range", y = "Proportion",
    fill = "Crime Category"
  ) +
  scale_fill_viridis_d(option = "magma") +  # Apply a perceptually uniform color palette
  theme_minimal()  # Use a clean visual theme

# Convert the static plot to an interactive Plotly version
ggplotly(crime_by_temp_range_plot)

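The chart shows that while most crime types occur consistently across temperature ranges, anti-social behaviour and public order offences are slightly more common in warmer conditions. In contrast, crimes such as vehicle crime and shoplifting remain steady regardless of temperature. This suggests that some crime types may be more sensitive to temperature than others, although each crime is matched to every day of its month, so the proportions reflect how many days of that month fell in each temperature range rather than the conditions at the time of the offence.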
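Because crime records carry only a month, a join to daily weather must either repeat each crime once per day or first reduce the weather to one row per month. The toy example below (made-up crimes and temperatures, base R only so it runs without extra packages) shows the difference in row counts between the two approaches; the monthly version mirrors the `temp_monthly` table built earlier.

```r
# Toy data: five crimes dated only to the month (3 in January, 2 in February)
crimes <- data.frame(
  id = 1:5,
  merge_date = c("2024-01", "2024-01", "2024-01", "2024-02", "2024-02")
)

# Toy daily weather for January and February 2024 (2024 is a leap year)
days <- seq(as.Date("2024-01-01"), as.Date("2024-02-29"), by = "day")
daily <- data.frame(
  merge_date = format(days, "%Y-%m"),
  TemperatureCAvg = seq_along(days) / 10  # made-up values, not real weather
)

# Joining on month alone repeats each crime once per day of its month
daily_join <- merge(crimes, daily, by = "merge_date")
cat("Rows after daily join:", nrow(daily_join), "\n")  # 3 * 31 + 2 * 29

# Aggregating weather to one row per month first keeps one row per crime
monthly <- aggregate(TemperatureCAvg ~ merge_date, data = daily, FUN = mean)
monthly_join <- merge(crimes, monthly, by = "merge_date")
cat("Rows after monthly join:", nrow(monthly_join), "\n")
```

With the monthly version each crime receives its month's average temperature, so binning by temperature assigns whole months to Low, Medium or High instead of spreading a month's crimes across bins day by day.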

Humidity Range vs Crime Types

# Categorize humidity into two bins: Low and High based on median split
combined$humidity_range <- cut(
  combined$HrAvg,
  breaks = quantile(combined$HrAvg, probs = c(0, 0.5, 1), na.rm = TRUE),
  labels = c("Low", "High"),
  include.lowest = TRUE
)

# Create a violin plot to visualize humidity distribution across crime categories
humidity_crime_violin_plot <- ggplot(combined, aes(x = category, y = HrAvg, fill = humidity_range)) +
  geom_violin(scale = "width", trim = FALSE, color = "black") +  # Violin plot with uniform width
  labs(
    title = "Distribution of Humidity by Crime Category",
    x = "Crime Category", y = "Humidity (%)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

# Convert the violin plot to an interactive Plotly chart
ggplotly(humidity_crime_violin_plot)

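The violin plot shows that most crime types occur under both low and high humidity, with violent and public order offences appearing slightly more associated with higher humidity levels. Because `scale = "width"` gives every violin the same maximum width, however, the plot shows the shape of each humidity distribution rather than the number of crimes in each bin, so any trend towards more crime in humid conditions would need to be confirmed with counts.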

Visual Analytics-Driven Approaches to Advanced Clustering Techniques

Interactive Leaflet Map with Clustering

# Filter out rows with missing latitude or longitude to ensure valid map points
crime_map_data <- crime24 %>% filter(!is.na(lat), !is.na(long))

# Create an interactive leaflet map with marker clustering for crime incidents
leaflet(crime_map_data) %>%
  addTiles() %>%  # Add default OpenStreetMap tile layer
  addMarkers(
    lng = ~long, lat = ~lat,  # Use coordinates from dataset
    clusterOptions = markerClusterOptions(),  # Enable clustering of nearby markers
    popup = ~paste("Crime:", category, "<br>", "Street:", street_name)  # Display info on click
  ) %>%
  setView(
    lng = mean(crime_map_data$long, na.rm = TRUE),  # Center the map on the average longitude
    lat = mean(crime_map_data$lat, na.rm = TRUE),   # Center the map on the average latitude
    zoom = 13  # Set zoom level for detailed town view
  )

The interactive leaflet map with clustering provides a spatial overview of crime hotspots across Colchester in 2024. Each numbered circle represents a group of nearby reported crimes, and the number gives the total incidents it contains at the current zoom level; zooming in splits clusters into smaller ones. Under the clustering plugin's default styling, circles are coloured green for fewer than 10 markers, yellow for fewer than 100 and orange for 100 or more, so warmer colours indicate higher concentrations of crime.

From the map, the town centre, particularly the area around Colchester Town Station and Southway, emerges as the primary hotspot, with some clusters exceeding 1,100 crimes at the initial zoom level. Other significant concentrations appear along Butt Road, Hythe Hill, and near St Nicholas Street, suggesting these may be high-traffic or densely populated areas. In contrast, peripheral areas such as Cymbeline Meadows or Abbey Field show much lower crime frequencies, with some clusters containing fewer than 10 incidents.

In summary, this clustered map helps visualise how crime is spatially distributed, highlighting the central urban area as the key hotspot. It enables authorities and policymakers to target specific areas for surveillance, prevention, or community engagement efforts.

K-Means Clustering of Crime Hotspots

# Prepare data for K-means clustering
crime_k <- crime_map_data %>%
  select(lat, long) %>%
  na.omit()

# Run K-means clustering (4 clusters here, adjust as needed)
set.seed(123)
crime_kmeans <- kmeans(crime_k, centers = 4)

# Add cluster assignments to your spatial data
crime_map_data$cluster <- factor(crime_kmeans$cluster)
# Define a color palette for clusters
cluster_palette <- colorFactor(
  palette = c("red", "blue", "green", "purple"),
  domain = crime_map_data$cluster
)

# Create a Leaflet map showing crime clusters from K-means results
leaflet(crime_map_data) %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    color = ~cluster_palette(cluster),  # Color markers by cluster
    radius = 5,
    stroke = FALSE,
    fillOpacity = 0.6,
    popup = ~paste("Cluster:", cluster, "<br>Crime:", category, "<br>Street:", street_name)
  ) %>%
  setView(
    lng = mean(crime_map_data$long, na.rm = TRUE),
    lat = mean(crime_map_data$lat, na.rm = TRUE),
    zoom = 14
  ) %>%
  addLegend("bottomright",
            pal = cluster_palette,
            values = ~cluster,
            title = "Crime Clusters",
            opacity = 0.8)

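The K-means clustering map shows how crime locations in Colchester group spatially by geographic coordinates. With four cluster centers, the algorithm assigns every incident to its nearest center, choosing the centers so that the total within-cluster squared distance is minimized, and each incident is colored by its cluster. Because K-means partitions all points rather than detecting dense areas, the clusters describe broad zones of the town rather than hotspots in themselves.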
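Two details matter when K-means is run on raw latitude and longitude. At Colchester's latitude (about 51.9°N), a degree of longitude covers only about 0.62 of the ground distance of a degree of latitude, so raw degrees exaggerate east-west separation. With the default single random start, results can also depend on the seed. The sketch below uses simulated points. It rescales longitude by the cosine of the mean latitude and keeps the best of several starts. The helper project_coords() and the choice nstart = 25 are assumptions, not part of the analysis above.

```r
# Simulated coordinates (hypothetical): three loose groups around Colchester.
set.seed(42)
pts <- data.frame(
  lat  = c(rnorm(100, 51.885, 0.003), rnorm(100, 51.895, 0.003), rnorm(100, 51.888, 0.003)),
  long = c(rnorm(100, 0.890, 0.004), rnorm(100, 0.905, 0.004), rnorm(100, 0.920, 0.004))
)

# Hypothetical helper: scale longitude by cos(mean latitude) so that one unit
# east-west spans roughly the same ground distance as one unit north-south
# (an equirectangular approximation, adequate over a single town).
project_coords <- function(lat, long) {
  lat0 <- mean(lat) * pi / 180
  cbind(x = long * cos(lat0), y = lat)
}

xy <- project_coords(pts$lat, pts$long)

# nstart = 25 runs K-means from 25 random starts and keeps the solution with
# the lowest total within-cluster sum of squares.
fit <- kmeans(xy, centers = 3, nstart = 25)
pts$cluster <- factor(fit$cluster)
table(pts$cluster)
```

Applied to crime_map_data, this would probably shift the cluster boundaries and labels relative to the map above, so the cluster descriptions would need to be re-read from the new map.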

The map shows four spatial clusters: Cluster 1 (red) in the southwestern zone, Cluster 2 (blue) in the eastern region, Cluster 3 (green) in the south-central area, and Cluster 4 (purple) along the northern stretch of the town center. Because the clusters are built from coordinates alone, this geographic separation is expected by construction. Showing that these neighborhoods also differ in their crime patterns would require comparing the mix of crime categories across clusters.
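Clusters built from coordinates alone are geographically separate by design. To test whether they also differ in crime type, one option is to tabulate each cluster's category mix. The sketch below uses a small hypothetical data frame. The helper category_mix() is illustrative, and the category labels are made up.

```r
# Hypothetical toy data: cluster labels and crime categories for ten incidents.
toy <- data.frame(
  cluster  = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4)),
  category = c("violent-crime", "shoplifting", "violent-crime",
               "anti-social-behaviour", "violent-crime", "anti-social-behaviour",
               "shoplifting", "shoplifting", "violent-crime", "public-order")
)

# Illustrative helper: share of each category within each cluster
# (margin = 1 makes every row sum to 1).
category_mix <- function(cluster, category) {
  prop.table(table(cluster, category), margin = 1)
}

mix <- category_mix(toy$cluster, toy$category)
round(mix, 2)
```

With the real data, category_mix(crime_map_data$cluster, crime_map_data$category) would give one row per cluster. Running chisq.test() on table(crime_map_data$cluster, crime_map_data$category) would test whether the category mix is independent of cluster, provided the expected counts are not too small.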

These insights are potentially valuable for law enforcement and urban planners. If differences in crime type between clusters are confirmed, they could point to areas shaped by nightlife, commercial activity, or residential vulnerability. Interventions such as increased patrols, surveillance, or community outreach could then be targeted at specific zones, making crime prevention across the town more effective and efficient.

Interactive Monthly Crime Trends by Category

# Group crime data by month and category, and count the number of incidents
crime_time <- crime24 %>%
  group_by(month_year = format(date, "%Y-%m"), category) %>%
  summarise(count = n(), .groups = 'drop')

# Create an interactive multi-line chart showing monthly crime trends per category
plot_ly(
  data = crime_time,
  x = ~month_year,                 # X-axis: month and year
  y = ~count,                      # Y-axis: number of crimes
  color = ~category,               # Color lines by crime category
  colors = viridis_pal(option = "D")(length(unique(crime_time$category))),  # Apply perceptually uniform palette
  type = 'scatter', mode = 'lines+markers'  # Line plot with markers
) %>%
  layout(
    title = "Interactive Monthly Crime by Category",
    xaxis = list(title = "Month"),
    yaxis = list(title = "Crime Count")
  )

The interactive chart shows monthly crime trends by category in 2024. Violent crime is the most frequent throughout the year, followed by anti-social behaviour and shoplifting. Some categories remain low and stable, such as bicycle theft and possession of weapons. Overall, the chart highlights seasonal fluctuations and helps identify which crime types dominate each month.

Animated Time Series with gganimate

# Prepare monthly crime data by summarising the number of incidents per category each month
crime_anim_data <- crime24 %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month, category) %>%
  summarise(crime_count = n(), .groups = 'drop')

# Create a base ggplot object for animation
crime_anim_plot <- ggplot(crime_anim_data, aes(x = reorder(category, -crime_count), y = crime_count, fill = category)) +
  geom_col(show.legend = FALSE) +  # Use column chart without legend
  labs(
    title = 'Monthly Crime by Category: {closest_state}',  # Dynamic title showing current month
    x = 'Crime Category',
    y = 'Number of Crimes'
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Tilt x-axis labels for clarity
  transition_states(month, transition_length = 2, state_length = 1) +  # Animate across months
  ease_aes('cubic-in-out')  # Smooth easing for transitions

# Render the animation as a GIF (requires gifski package)
animate(crime_anim_plot, nframes = 100, fps = 10, renderer = gifski_renderer())

The animated bar chart shows how crime counts by category changed month by month throughout 2024. Violent crime consistently dominates across all months, followed by anti-social behaviour and shoplifting. The bar heights of the less frequent categories shift slightly from month to month, but overall trends remain stable.
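In the plot above, reorder(category, -crime_count) orders the bars by each category's mean monthly count over the whole year. Bar positions therefore stay fixed from frame to frame, and month-to-month changes show up only as bar heights. If the monthly ranking itself is of interest, it can be computed separately for each month. The sketch below applies base R's ave() and rank() to hypothetical counts.

```r
# Hypothetical monthly counts for three categories (illustrative values only).
monthly <- data.frame(
  month       = rep(c("2024-01", "2024-02"), each = 3),
  category    = rep(c("bicycle-theft", "drugs", "robbery"), times = 2),
  crime_count = c(12, 20, 15, 18, 14, 16)
)

# Rank categories within each month (1 = most frequent): negate the counts so
# that rank() puts the largest count first, and apply it separately per month.
monthly$rank <- ave(-monthly$crime_count, monthly$month,
                    FUN = function(x) rank(x, ties.method = "first"))
monthly[order(monthly$month, monthly$rank), ]
```

Ties are broken by row order here (ties.method = "first"). rank() supports other tie rules, such as "min" or "average".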

This animation gives a dynamic view of monthly fluctuations in crime types, making it easier to spot peaks and patterns over time. It is especially useful for identifying months with spikes in particular offences and for seeing when individual categories rise or fall during the year.

Animated Crime Hotspot Map (Over Time)

# Prepare spatial crime data with valid coordinates and a monthly timestamp
crime_map_anim <- crime24 %>%
  filter(!is.na(lat) & !is.na(long)) %>%
  mutate(month = format(date, "%Y-%m"))

# Identify the top 5 most frequent crime categories
top_categories <- crime_map_anim %>%
  count(category, sort = TRUE) %>%
  slice_max(n, n = 5) %>%
  pull(category)
# Filter the dataset to include only the top 5 crime categories
crime_map_anim <- crime_map_anim %>%
  filter(category %in% top_categories)

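# Define custom color palette for selected crime categories
# (unnamed, so colors are matched to the categories in order, which is
# alphabetical for character categories)
custom_colors <- c("red", "pink", "blue", "green", "black")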

# Load UK map boundaries for base layer
uk_map <- map_data("world", region = "UK")

# Create an animated plot showing monthly changes in crime hotspots
map_anim_plot <- ggplot() +
  geom_polygon(data = uk_map, aes(x = long, y = lat, group = group),
               fill = "white", color = "gray80") +  # Base map with light outline
  geom_point(data = crime_map_anim, aes(x = long, y = lat, color = category),
             alpha = 0.7, size = 2) +  # Plot crime points with transparency
  scale_color_manual(values = custom_colors) +  # Apply custom colors
  coord_quickmap(xlim = range(crime24$long, na.rm = TRUE),
                 ylim = range(crime24$lat, na.rm = TRUE)) +  # Zoom to the data extent with an approximately correct aspect ratio for this latitude
  labs(
    title = "Monthly Crime Hotspots in Colchester: {closest_state}",
    subtitle = "Top 5 Crime Categories",
    x = "Longitude", y = "Latitude"
  ) +
  theme_minimal() +
  theme(legend.position = "right") +
  transition_states(month, transition_length = 2, state_length = 1) +  # Animate by month
  ease_aes("linear")  # Smooth linear transition

# Render the animation as a GIF
animate(map_anim_plot, nframes = 100, fps = 10, renderer = gifski_renderer())

The animated hotspot map showcases how the top five crime categories are spatially distributed across Colchester each month in 2024. Each colored point represents an incident from a leading category such as violent crime, anti-social behaviour, or shoplifting.

Across the months, hotspots remain concentrated around central Colchester while their density fluctuates. Violent crimes (black) appear frequently and are widely dispersed, whereas shoplifting (green) and public order offences (blue) are more centralized. Points that appear to glide between frames are an artifact of the animation interpolating between months, not incidents moving. The meaningful signal is the month-to-month change in density. These temporal patterns can help law enforcement allocate resources and plan patrols more effectively.

Animated Temperature Over Time

# Select relevant columns for animation (daily average temperature)
temp_anim <- temp24 %>%
  select(Date, TemperatureCAvg)

# Create a line plot of daily average temperature
temp_plot <- ggplot(temp_anim, aes(x = Date, y = TemperatureCAvg)) +
  geom_line(color = "purple", linewidth = 0.7) +  # Plot in purple with moderate line thickness
  labs(
    title = "Daily Average Temperature in Colchester (2024)",
    x = "Date", y = "Temperature (°C)"
  ) +
  theme_bw() +  # Use classic black-and-white theme
  transition_reveal(Date)  # Animate reveal over time using the Date variable

# Render the animated plot as a GIF
animate(temp_plot, nframes = 100, fps = 10, renderer = gifski_renderer())

The animated line plot shows the daily average temperature in Colchester throughout 2024. It reflects the expected seasonal pattern: temperatures rise steadily from winter to summer, peak around July and August, and decline again toward the end of the year.

The broad seasonal curve reflects Colchester's temperate climate, with warm conditions in mid-year and cooler conditions in the early and late months. Superimposed on this curve, day-to-day variation produces short-term spikes and drops in temperature, possibly linked to brief weather events.
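A centred moving average is a simple way to separate the seasonal curve from day-to-day variation. The sketch below applies a 7-day window to simulated daily temperatures. The window length and the simulated series are assumptions. It calls stats::filter() explicitly because loading dplyr masks filter(), as the package-loading output earlier shows.

```r
# Simulated daily temperatures for 2024 (hypothetical): a seasonal cycle plus noise.
set.seed(7)
dates <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
temp  <- 11 - 7 * cos(2 * pi * (doy - 15) / 366) + rnorm(length(dates), sd = 2)

# Centred 7-day moving average: each value is the mean of that day and the
# three days on either side; the first and last three days are NA because
# the window is incomplete there.
smooth7 <- stats::filter(temp, rep(1 / 7, 7), sides = 2)

temp_df <- data.frame(Date = dates, Temp = temp, Smooth7 = as.numeric(smooth7))
head(temp_df, 5)
```

For the real series, the same call on temp24$TemperatureCAvg would produce a smoothed line that could be overlaid on the plot above. This assumes the rows are sorted by Date and no days are missing.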

Summary and Recommendations

Summary of Findings

This project examined whether weather conditions (temperature, humidity, precipitation, and pressure) influenced crime patterns in Colchester during 2024. Exploratory analysis of both the crime and climate data revealed several notable associations.

Seasonal trends showed that crime peaked during warmer months, particularly summer and early autumn, with offences like anti-social behaviour and public disorder more common during these periods. This suggests that warmer weather may indirectly contribute to higher crime through increased outdoor activity and social interaction. Conversely, colder and wetter months saw slightly reduced crime, consistent with the idea that poor weather discourages public presence and opportunistic behaviour.

Among the weather variables, temperature showed the strongest association with total crime, while humidity and pressure showed weaker, more category-specific links. Rainfall showed little association with overall crime, though certain offences appeared mildly sensitive to precipitation.
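The crime records are dated only to the month. Daily weather therefore has to be averaged to monthly values before it can be compared with crime counts, which leaves just twelve paired observations for 2024. The sketch below shows one way to do this in base R on simulated inputs. The column names follow temp24 (Date, TemperatureCAvg), but the monthly_crime totals are invented. Spearman's rank correlation is an assumed choice, not necessarily the method used earlier in this report.

```r
# Simulated inputs (hypothetical): daily temperature for 2024 and invented monthly crime totals.
set.seed(11)
days <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
daily <- data.frame(
  Date = days,
  TemperatureCAvg = 11 - 7 * cos(2 * pi * (as.numeric(format(days, "%j")) - 15) / 366) +
    rnorm(length(days), sd = 2)
)
monthly_crime <- data.frame(
  month  = format(seq(as.Date("2024-01-01"), by = "month", length.out = 12), "%Y-%m"),
  crimes = c(980, 940, 1010, 1020, 1080, 1110, 1150, 1130, 1090, 1060, 1000, 990)
)

# Average the daily weather within each calendar month, then join to the crime totals.
daily$month  <- format(daily$Date, "%Y-%m")
monthly_temp <- aggregate(TemperatureCAvg ~ month, data = daily, FUN = mean)
merged       <- merge(monthly_crime, monthly_temp, by = "month")

# Spearman's rank correlation between monthly crime and mean temperature.
ct <- cor.test(merged$crimes, merged$TemperatureCAvg, method = "spearman")
ct$estimate
ct$p.value
```

With n = 12, even a fairly strong correlation carries wide uncertainty. This is one reason to read the associations above as indicative rather than conclusive.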

Spatial analysis identified persistent hotspots around central Colchester, especially near nightlife and commercial areas. K-means clustering divided incidents into distinct geographic zones, offering a basis for targeted resource deployment. Animated and interactive visualisations further highlighted temporal and spatial shifts in crime.

Recommendations

Based on these findings, several recommendations are proposed. Policing efforts could be adjusted seasonally, with increased patrols during warmer months and around hotspot areas. Integrating weather data into early-warning systems may also support proactive crime prevention. Additionally, spatial clustering results could inform strategic CCTV placement and community engagement initiatives.

Future Work

Future research could build on this project by incorporating additional environmental factors such as wind speed, cloud cover, and visibility. It could also examine crime patterns by time of day, although this would require more finely timestamped records than the monthly data used here. Applying predictive modelling techniques, such as logistic regression or machine learning, may improve forecasting of high-risk periods based on weather trends. Comparative studies across other towns or cities could also reveal whether similar weather-crime relationships hold in different urban contexts.
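Logistic regression suits a binary outcome, such as whether a period is high-risk. For the crime counts themselves, Poisson regression with a log link is a common starting point. The sketch below fits one to simulated counts. The temperature effect of 0.05 on the log scale is an assumption built into the simulation. The 2024 data provide only monthly totals, so a real fit would rest on far fewer observations.

```r
# Simulated counts (hypothetical): expected crimes rise by about 5% per 1 °C.
set.seed(3)
n <- 365
temperature <- runif(n, min = 0, max = 25)
crimes <- rpois(n, lambda = exp(2 + 0.05 * temperature))

# Poisson regression with a log link: log(E[crimes]) = b0 + b1 * temperature.
crime_glm <- glm(crimes ~ temperature, family = poisson(link = "log"))
coef(crime_glm)

# exp(b1) is the multiplicative change in expected crimes per extra degree.
exp(coef(crime_glm)[["temperature"]])
```

If the counts vary more than a Poisson model allows (overdispersion), a negative binomial model such as MASS::glm.nb() is a common alternative.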

Conclusion

This study suggests that weather conditions are associated with measurable, though modest, differences in crime patterns in Colchester. Integrating environmental insights into crime analysis can support more informed, responsive policing and long-term public safety planning.

References